Fuzzy Perceptive Values for MDPs with Discounting

Authors

  • Masami Kurano
  • Masami Yasuda
  • Jun-ichi Nakagami
  • Yuji Yoshida
Abstract

In this paper, we formulate the fuzzy perceptive model for discounted Markov decision processes in which the perception for transition probabilities is described by fuzzy sets. The optimal expected reward, called a fuzzy perceptive value, is characterized and calculated by a new fuzzy relation. As a numerical example, a machine maintenance problem is considered.
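To make the setting concrete, the following is a minimal sketch and not the authors' method or their numerical example. It assumes a hypothetical two-state machine maintenance MDP (states "good" and "broken"), made-up rewards, a discount factor of 0.9, and a made-up triangular fuzzy number for the failure probability. For each alpha-cut of that fuzzy number it runs standard value iteration at crisp probabilities on a grid over the cut and records the smallest and largest optimal discounted value, which gives an interval that can be read as a rough approximation of the alpha-cut of a fuzzy value.

```python
# Illustrative sketch only: every number and name below is an assumption,
# not taken from the paper.

def optimal_value(p_fail, beta=0.9, tol=1e-10):
    """Optimal discounted value of the 'good' state, by value iteration.

    States: 0 = good, 1 = broken. Hypothetical dynamics:
      good / operate  : reward 1, breaks with probability p_fail
      broken / operate: reward 0, stays broken
      broken / repair : reward -2, becomes good with probability 1
    """
    v = [0.0, 0.0]
    while True:
        good = 1.0 + beta * ((1 - p_fail) * v[0] + p_fail * v[1])
        broken = max(0.0 + beta * v[1], -2.0 + beta * v[0])
        new_v = [good, broken]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v[0]
        v = new_v


def alpha_cut(alpha, low=0.1, peak=0.2, high=0.4):
    """Alpha-cut [l, u] of a triangular fuzzy number (low, peak, high)."""
    return (low + alpha * (peak - low), high - alpha * (high - peak))


def value_interval(alpha, grid=21):
    """Min and max optimal value over a grid of crisp points in the alpha-cut."""
    lo, hi = alpha_cut(alpha)
    values = [optimal_value(lo + (hi - lo) * k / (grid - 1)) for k in range(grid)]
    return min(values), max(values)


for alpha in (0.0, 0.5, 1.0):
    print(alpha, value_interval(alpha))
```

As alpha increases the cut of the failure probability shrinks, so the printed value intervals are nested and collapse to (approximately) a single number at alpha = 1. The grid search is a simplification; it is adequate here only because this toy model has a single uncertain parameter.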

Similar resources

Fuzzy optimality relation for perceptive MDPs - the average case

This paper is a sequel to Kurano et al. [9], [10], in which the fuzzy perceptive models for optimal stopping or discounted Markov decision processes are given. We proposed a method of computing the corresponding fuzzy perceptive values. Here, we deal with the average case for Markov decision processes with fuzzy perceptive transition matrices and characterize the optimal average expected reward, ca...

Fuzzy Optimality Equations for Perceptive MDPs

This paper is a sequel to Kurano et al. [9], [10], in which the fuzzy perceptive models for optimal stopping or discounted Markov decision processes are proposed and the methods of computing the corresponding fuzzy perceptive values are given. Here, we deal with the average case for Markov decision processes with fuzzy perceptive transition matrices and characterize the optimal average expected rew...

Perceptive Evaluation for the Optimal Discounted Reward in Markov Decision Processes

We formulate a fuzzy perceptive model for Markov decision processes with discounted payoff in which the perception for transition probabilities is described by fuzzy sets. Our aim is to evaluate the optimal expected reward, which is called a fuzzy perceptive value, based on the perceptive analysis. It is characterized and calculated by a certain fuzzy relation. A machine maintenance problem is ...

Q learning with finite trials

The standard reinforcement learning model is powerful enough to deal with never-ending trials. By slightly discounting rewards obtained in the future, an infinite walk in the environment is still guaranteed to have a finite expected future reward. This, however, comes at a price. The discounting may corrupt estimates of the expected return in ending trials. Also in most cases algorithms that can d...

A Genetic Search In Policy Space For Solving Markov Decision Processes

Markov Decision Processes (MDPs) have been studied extensively in the context of decision making under uncertainty. This paper presents a new methodology for solving MDPs, based on genetic algorithms. In particular, the importance of discounting in the new framework is dealt with and applied to a model problem. Comparison with the policy iteration algorithm from dynamic programming reveals the ...

Publication date: 2005